From construction to deployment of LifeWatchGreece: the potential role of EGI-LW Competence Centre
4/23/2015
by Emmanouela Panteri
Contributors: Christos Arvanitidis, Nicolas Bailly, Sarah Faulwetter, Jacques Lagnel, George Perantinos, Anastasis Oulas, Panagiotis Vavilis, Kleoniki Keklikoglou, Matina Nikolopoulou, Alexandros Gougousis and about 30 data managers
From construction to deployment of LifeWatchGreece: The potential role of
EGI - LW Competence Centre
LifeWatchGreece e-Infrastructure
LifeWatchGreece
LWG e-infrastructure:
● Multi-server e-infrastructure currently deployed in HCMR, Crete
● Hosts biodiversity data and applications
Applications:
● e-services: searching datasets/data or one-shot analyses
● vLabs: interfaces for advanced selection of datasets/data, and more elaborate suites of analyses
A series of web tools (vLabs or e-services) for the public
Application development in 2 steps:
● Independent development of a web application (by the team)
● Integration into the infrastructure / portal
Access Control
● Landing page: list of applications
● One-time sign-up for accessing all apps
● A few applications require more credentials: the compute-intensive ones
● User rights management
Graphical Interface
● A common graphical interface frame/wrapper introducing all applications
Accessed through the LifeWatchGreece Portal
portal.lifewatchgreece.eu
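The access-control rules listed on this slide can be sketched in a few lines of Python. This is only an illustration of the stated policy (one sign-up opens every application, compute-intensive ones need an extra credential); the class names, application names and the `extra_credentials` field are hypothetical and do not describe the portal's actual code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class App:
    name: str                        # hypothetical application name
    compute_intensive: bool = False  # such apps ask for extra credentials


@dataclass
class User:
    username: str
    signed_up: bool = False
    # Names of compute-intensive apps this user was granted individually.
    extra_credentials: set = field(default_factory=set)


def can_access(user: User, app: App) -> bool:
    """One-time sign-up opens every app; compute-intensive apps need a grant."""
    if not user.signed_up:
        return False
    if app.compute_intensive:
        return app.name in user.extra_credentials
    return True


def landing_page(user: User, apps: list) -> list:
    """Return the names of the applications the user may open."""
    return [app.name for app in apps if can_access(user, app)]


apps = [App("dataset-search"), App("heavy-analysis", compute_intensive=True)]
alice = User("alice", signed_up=True)
print(landing_page(alice, apps))  # ['dataset-search']
alice.extra_credentials.add("heavy-analysis")
print(landing_page(alice, apps))  # ['dataset-search', 'heavy-analysis']
```

Keeping the check in one function mirrors the idea of a common wrapper: each application stays independent and only the portal decides who gets in.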
LWG e-Infrastructure: advantages
● Applications developed in any programming language (PHP, Java EE, .NET, ...)
● Design, development and maintenance of applications independent from each other: a common standard only for data exchange (DwC, …)
● Each application runs in an independent execution environment: the number of VMs can be scaled up if more apps require it.
● Compartmentalized security: a compromised application does not affect the others
● Core developers involved only at integration stage
LWG e-Infrastructure: advantages
● Other integration methods: iframes, graphical integration of commercial apps
● Open-source applications can be integrated with few adaptations, both at the access level and the graphical level, especially when they follow an MVC architecture
● Moreover, most CMSes can be easily integrated, at least at the access control level
● Certain JavaScript and CSS frameworks are provided by default through libraries in order to enforce the consistency of the user interface throughout the portal
LWG Portal diagram
LWG: Application Layer, Data Layer, Cluster, Communication
LWG e-Infrastructure: What is missing
● No user workspace
● Currently, files are not retrievable from one session to the next, or from one tool to another.
● Could EGI Competence Center provide such functionality? Workspace development will significantly increase the storage requirements.
● Would require some work between LWG infrastructure and EGI-CC (e.g., space allocation after sign-up)
HPC bioinformatics platform
Mainly focused on OMICs NGS data analysis:
● Transcriptomics (RNA-Seq)
● Genomics (Eukaryote and bacterial)
● Metagenomics (microbial community)
● Metabarcoding
● RAD-Sequencing
More than 170 bioinformatics packages covering:
● Genomes & transcriptomes de novo assembly
● Functional and structural genes annotation
● Sequence similarity (parallel BLAST) and mapping
● Population genetics
● Phylogeny reconstruction
● Statistics (250 R packages installed)
● Genetic markers mining/analysis
43 users from 11 institutes in 5 countries (Greece, Italy, France, Norway, Portugal)
More than 8000 jobs submitted during the last month
HPC bioinformatics platform upgrade
Hardware
Current platform:
● 9 worker nodes
● 108 cores
● 784 GB RAM
● 30 TB storage
● 10 Gbps Ethernet network
Upgraded platform (~3x performance):
● 13 worker nodes
● 300 CPU cores
● 2.5 TB RAM
● 120 TB storage
● 40 Gbps InfiniBand network
Software (open source)
Current platform:
● Gentoo Linux
● Resource manager: Torque/Maui
● Storage: XFS/NFS
● Storage user quotas
Upgraded platform:
● CentOS Linux / Debian
● Resource manager: SLURM
● Storage: Lustre and ZFS/NFS
● Storage group/user quotas
● LXC virtualization
● User management via LDAP
Languages: GCC, ICC/IFC, R, BioPerl, Biopython, Ruby, BioJava, ...
Parallelization: Open MPI, OpenMP and pthreads
Database servers: MySQL, PostgreSQL, ...
Bioinformatic challenges
RNA-Seq data analysis => 360 Mreads
Optimised and parallelised pipeline

Step                    Runtime on the biocluster (h)   Runtime on a PC (1 CPU)
Assembly (*)            12 (10 threads)                 >120 h
Annotation: BLAST       96 (94 jobs)                    >> 3 months
Annotation: InterPro    32 (48 jobs)                    1.5 months
Mapping                 4 (12*10 threads)               >10 days
Total                   ~6 days                         >5 months
(*) Assembly requires ~350 GB shared RAM

Sequence similarity search: parallel BLAST => 10,000 queries
Speedup / Runtime

Nb CPUs        1                  94
blastn (nt)    1.0 / 6.1 days     105.4 / 1.4 h
blastx (nr)    1.0 / 11.6 days    88.8 / 3.1 h
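The figures on this slide can be cross-checked with a little arithmetic. The Python sketch below sums the biocluster runtimes and recomputes the BLAST speedups from the displayed runtimes; the only inputs are the numbers shown on the slide, and the parallel efficiency (speedup divided by the number of CPUs) is an extra measure added here for illustration. Because the slide shows rounded runtimes, the recomputed speedups are expected to be close to, not identical to, the quoted 105.4 and 88.8.

```python
HOURS_PER_DAY = 24

# Biocluster runtimes in hours, as listed on the slide.
cluster_hours = {"assembly": 12, "blast": 96, "interpro": 32, "mapping": 4}
total_hours = sum(cluster_hours.values())
total_days = total_hours / HOURS_PER_DAY
print(total_hours, total_days)  # 144 6.0


def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of the 1-CPU runtime to the parallel runtime."""
    return serial_hours / parallel_hours


def efficiency(speedup_value: float, n_cpus: int) -> float:
    """Fraction of the ideal (linear) speedup that was achieved."""
    return speedup_value / n_cpus


N_CPUS = 94
# Serial runtimes are given in days on the slide, parallel ones in hours.
blastn = speedup(6.1 * HOURS_PER_DAY, 1.4)
blastx = speedup(11.6 * HOURS_PER_DAY, 3.1)

for name, value in (("blastn", blastn), ("blastx", blastx)):
    print(f"{name}: speedup ~{value:.1f}, efficiency ~{efficiency(value, N_CPUS):.2f}")
```

The total of 144 h matches the "~6 days" row, and both recomputed speedups land within about one unit of the values on the slide.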
eServices and vLabs: the R-vLab
How can EGI Competence Center help the LWG e-infrastructure increase its computational power?
● Uses the “R” programming language
● Supports an integrated and optimized online R environment (data manipulation and computational speed-up)
● Helps overcome a severe shortage of computational power, e.g. when calculating several biodiversity indices and multivariate analyses on large matrices
eServices and vLabs: the R-vLab
Conventional Mantel test compared to parallel Mantel test: ~20-fold speed-up
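Why a Mantel test parallelises well can be shown with a minimal sketch: the permutations are independent of each other, so they can be split into chunks and counted separately. The Python code below is an illustration under that assumption only; it is not the R-vLab's R implementation, and the function names, the one-sided test and the chunking scheme are choices made here.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally permuting rows and columns together."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def count_extreme(task):
    """Run one chunk of permutations; count those reaching the observed r."""
    dist_a, dist_b, observed, n_perm, seed = task
    rng = random.Random(seed)
    flat_a = upper_triangle(dist_a)
    order = list(range(len(dist_b)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(flat_a, upper_triangle(dist_b, order)) >= observed:
            hits += 1
    return hits


def mantel(dist_a, dist_b, n_perm=999, workers=1, seed=0):
    """One-sided permutation Mantel test; returns (r, p_value)."""
    observed = pearson(upper_triangle(dist_a), upper_triangle(dist_b))
    # Split the permutations as evenly as possible across the workers.
    sizes = [n_perm // workers + (1 if i < n_perm % workers else 0)
             for i in range(workers)]
    tasks = [(dist_a, dist_b, observed, size, seed + i)
             for i, size in enumerate(sizes)]
    if workers == 1:
        hits = sum(map(count_extreme, tasks))
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            hits = sum(pool.map(count_extreme, tasks))
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: distances between points on a line, and the same distances doubled.
points = [0, 1, 2, 3, 5, 8]
dist_a = [[abs(p - q) for q in points] for p in points]
dist_b = [[2 * d for d in row] for row in dist_a]
r, p = mantel(dist_a, dist_b, n_perm=199, workers=1)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Passing `workers` greater than 1 distributes the chunks over separate processes; on platforms that spawn rather than fork processes, that call should sit under an `if __name__ == "__main__":` guard. The speed-up actually obtained depends on matrix size and core count, and nothing here reproduces the ~20-fold figure reported on the slide.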
eServices and vLabs: MicroCT
● Micro-computed tomography
● Non-destructive method of 3D x-ray microscopy
● Creation of 3D models of objects from a series of x-ray projection images
MicroCT offers:
● Collection of virtual galleries of taxa displayed and disseminated
● Manipulation of the 3D models through a series of online tools
● Download of datasets for local manipulations
How can EGI Competence Center help the LWG e-infrastructure with storage and image manipulation, incl. 3D models?
MicroCT: current issues
In general:
● Potentially large increase in the number of image galleries, especially from museum specimen collections (several orders of magnitude)
● Need for 3D metadata standards: dissemination and searching
● Need for 3D data annotation protocols and tools
● Need for tools to search across the scattered catalogues of galleries (centralized or distributed)
In LWG:
● MicroCT generates many image files: storage issue
● Processing and manipulating images is CPU-intensive: computing issue
LWG and EGI Competence Center
Processing power and storage requirements
Harvesting various other repositories such as:
● Taxonomic: CoL and PESI (and components: FADA, EMRS, E+MP), WoRMS, EEA/EUNIS, ...
● Occurrences: GBIF, OBIS, ...
● Species traits: PolyTraits, FishBase, SeaLifeBase, EMODnet, ...
● Bibliography: RefBank, BHL, AnimalBase, ...
● Citizen Science: iNaturalist, ...
● Workflows: BioVeL, ...
Install mirror websites: FishBase, RefBank, GNI
Develop Web Services for disseminating LWG data:
● Concerns about performance due to Web services use
Linked Data / Linking Open Data
LifeWatchGreece principle: make data available to everybody
A number of datasets are ready as RDF in triplestores
Diagram from http://lod-cloud.net/
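To make "datasets as RDF" concrete, here is a minimal Python sketch that serialises one occurrence record as N-Triples using Darwin Core terms, the exchange standard named earlier in the talk. The subject URI, the record values and the helper names are hypothetical; the slides do not say which vocabularies or tools the LWG triplestores actually use, and the literal escaping is deliberately simplified.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace


def literal(value) -> str:
    """Render a plain literal (simplified: escapes backslashes and quotes only)."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def occurrence_to_ntriples(subject_uri: str, record: dict) -> str:
    """Emit one triple per Darwin Core term in the record."""
    lines = [
        f"<{subject_uri}> <{DWC}{term}> {literal(value)} ."
        for term, value in record.items()
    ]
    return "\n".join(lines)


# Hypothetical record; the values are for illustration only.
record = {
    "scientificName": "Posidonia oceanica",
    "decimalLatitude": 35.33,
    "decimalLongitude": 25.28,
}
print(occurrence_to_ntriples("http://example.org/occurrence/1", record))
```

Each printed line is a subject, predicate and object triple, which is the form a triplestore loads and a Linked Open Data consumer can follow.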
LifeWatchGreece Research Infrastructure, funded by the GSRT (Greek government: structural funds), is the national effort to address the above requirement and to support relevant studies.
To materialize its aim, LWG RI adheres to the central lifewatch.eu guidelines, and attempts to ally all the Greek scientific human resources working on biodiversity data and data observatories.
Coordinated by the Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC, www.imbbc.hcmr.gr) of the Hellenic Center for Marine Research (HCMR, www.hcmr.gr), LWG includes 49 partner institutions covering a wide range of scientific disciplines (terrestrial, marine and freshwater biology, zoology, botany, geography, forestry, agriculture, genetics, biotechnology, pharmacy, aquaculture, education and law).
Thank you ;)