From construction to deployment of LifeWatchGreece: the potential role of the EGI-LW Competence Centre

Transcript of From construction to deployment of LifeWatchGreece: the potential role of the EGI-LW Competence Centre

4/23/2015

by Emmanouela Panteri

Contributors: Christos Arvanitidis, Nicolas Bailly, Sarah Faulwetter, Jacques Lagnel, George Perantinos, Anastasis Oulas, Panagiotis Vavilis, Kleoniki Keklikoglou, Matina Nikolopoulou, Alexandros Gougousis and about 30 data managers

From construction to deployment of LifeWatchGreece: The potential role of the EGI-LW Competence Centre

LifeWatchGreece e-Infrastructure

LWG e-infrastructure:
● Multi-server e-infrastructure currently deployed at HCMR, Crete
● Hosts biodiversity data and applications

Applications: a series of web tools (vLabs or e-services) for the public
● e-services: searching datasets/data or one-shot analyses
● vLabs: interfaces for advanced selection of datasets/data, and more elaborate suites of analyses

Accessed through the LifeWatchGreece Portal

portal.lifewatchgreece.eu

Application development in 2 steps:
● Independent development of a web application (by the team)
● Integration into the infrastructure / portal

Access Control
● Landing page: list of applications
● One-time sign-up for accessing all apps
● A few applications require more credentials: the compute-intensive ones
● User rights management

Graphical Interface
● A common graphical interface frame/wrapper introducing all applications
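The access-control rules on this slide (one sign-up for all apps, extra credentials for the compute-intensive ones) could be sketched roughly as follows. This is only an illustration: the application names, the `extra_credentials` field and the `accessible_apps` helper are hypothetical and are not taken from the LWG portal code.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    signed_up: bool = False
    # Hypothetical: names of compute-intensive apps this user was granted.
    extra_credentials: set = field(default_factory=set)


# Hypothetical application registry: True marks a compute-intensive app.
APPS = {"dataset-search": False, "r-vlab": True}


def can_access(user: User, app: str) -> bool:
    """One sign-up opens every app; compute-intensive apps need an extra grant."""
    if app not in APPS or not user.signed_up:
        return False
    if APPS[app]:
        return app in user.extra_credentials
    return True


def accessible_apps(user: User) -> list:
    """Return the sorted names of the applications this user may open."""
    return sorted(app for app in APPS if can_access(user, app))
```

Keeping the check in one function mirrors the idea that application teams develop independently and the portal handles access at integration time.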

LWG e-Infrastructure: advantages

● Applications developed in any programming language (PHP, Java EE, .NET, ...)

● Design, development and maintenance of applications independent from each other: a common standard only for data exchange (DwC, …)

● Each application runs in an independent execution environment: the number of VMs can be scaled up if more apps are added.

● Compartmentalised security: a compromised application does not affect the others

● Core developers involved only at integration stage
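As an illustration of exchanging data through a common standard, the sketch below writes and reads occurrence records as CSV with Darwin Core (DwC) column headers. The column names are real Darwin Core terms; the record values, the identifier and the helper names `to_dwc_csv` / `from_dwc_csv` are made up for the example and do not describe the actual LWG exchange format.

```python
import csv
import io

# Real Darwin Core term names; the record values below are made up.
DWC_FIELDS = ["occurrenceID", "scientificName", "decimalLatitude",
              "decimalLongitude", "eventDate", "basisOfRecord"]


def to_dwc_csv(records):
    """Serialise occurrence records as CSV with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_dwc_csv(text):
    """Parse the CSV back into a list of dicts keyed by Darwin Core terms."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


example = [{
    "occurrenceID": "example:occ:1",  # hypothetical identifier
    "scientificName": "Posidonia oceanica",
    "decimalLatitude": "35.33",
    "decimalLongitude": "25.28",
    "eventDate": "2015-04-23",
    "basisOfRecord": "HumanObservation",
}]
```

Because both sides agree only on the column names, an application written in PHP, Java EE or .NET could produce or consume the same file without sharing any code.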

LWG e-Infrastructure: advantages (continued)

● Other integration methods: iframes, for graphically integrating commercial apps

● Open-source applications can be integrated with few adaptations at both the access-control and graphical levels, especially when they follow an MVC architecture

● Moreover, most CMSes can be easily integrated, at least at the access control level

● Certain JavaScript and CSS frameworks are provided by default through libraries to enforce a consistent user interface throughout the portal

LWG Portal diagram

[Diagram: LWG Application Layer, Data Layer, Cluster, Communication]

LWG e-Infrastructure: What is missing

● No user workspace
● Currently, files cannot be retrieved from one session to the next, or from one tool to another.
● Could the EGI Competence Centre provide such functionality?

Workspace development will significantly increase the storage requirements.

● Would require some work between the LWG infrastructure and the EGI-CC (e.g., space allocation after sign-up)
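To make the missing piece concrete, here is a minimal sketch of what a per-user workspace with allocated space might look like: a directory created at sign-up, a byte quota, and files that survive from one session or tool to the next. The `Workspace` class, the quota value and the directory layout are assumptions for illustration only, not a design agreed between LWG and the EGI-CC.

```python
import tempfile
from pathlib import Path


class Workspace:
    """Per-user directory with a simple byte quota (illustrative only)."""

    def __init__(self, root: Path, user_id: str, quota_bytes: int):
        self.path = Path(root) / user_id
        # Stands in for "space allocation after sign-up".
        self.path.mkdir(parents=True, exist_ok=True)
        self.quota_bytes = quota_bytes

    def used_bytes(self) -> int:
        return sum(f.stat().st_size for f in self.path.iterdir() if f.is_file())

    def save(self, name: str, data: bytes) -> None:
        """Store a file so that another session or tool can read it later."""
        target = self.path / name
        existing = target.stat().st_size if target.exists() else 0
        if self.used_bytes() - existing + len(data) > self.quota_bytes:
            raise OSError("workspace quota exceeded")
        target.write_bytes(data)

    def load(self, name: str) -> bytes:
        return (self.path / name).read_bytes()


# Two "sessions" for the same (hypothetical) user share the same files.
root = Path(tempfile.mkdtemp())
session_one = Workspace(root, "user42", quota_bytes=1_000_000)
session_one.save("indices.csv", b"site,shannon\nA,2.1\n")
session_two = Workspace(root, "user42", quota_bytes=1_000_000)
```

A production workspace would sit on shared storage and enforce quotas at the file-system level; the sketch only shows the contract that tools would rely on.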

HPC bioinformatics platform

Mainly focused on OMICs NGS data analysis:
● Transcriptomics (RNA-Seq)
● Genomics (eukaryote and bacterial)
● Metagenomics (microbial community)
● Metabarcoding
● RAD-Sequencing

More than 170 bioinformatics packages covering:
● De novo assembly of genomes and transcriptomes
● Functional and structural gene annotation
● Sequence similarity (parallel BLAST) and mapping
● Population genetics
● Phylogeny reconstruction
● Statistics (250 R packages installed)
● Genetic marker mining/analysis

43 users from 11 institutes in 5 countries (Greece, Italy, France, Norway, Portugal)

More than 8000 jobs submitted during the last month

HPC bioinformatics platform upgrade

Current platform

Hardware:
● 9 worker nodes
● 108 cores
● 784 GB RAM
● 30 TB storage
● 10 Gbps Ethernet network

Software (open source):
● Gentoo Linux
● Resource manager: Torque/Maui
● Storage: XFS/NFS
● Storage user quotas

Upgraded platform (~3x performance)

Hardware:
● 13 worker nodes
● 300 CPU cores
● 2.5 TB RAM
● 120 TB storage
● 40 Gbps InfiniBand network

Software (open source):
● CentOS Linux/Debian
● Resource manager: SLURM
● Storage: Lustre and ZFS/NFS
● Storage group/user quotas
● LXC virtualization
● User management via LDAP

Languages: GCC, ICC/IFC, R, BioPerl, Biopython, Ruby, BioJava, ...
Parallelization: OpenMPI, OpenMP and pthreads
Database servers: MySQL, PostgreSQL, ...
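The slide does not say how the "~3x performance" figure was obtained. As a rough cross-check, the per-resource ratios between the two hardware lists can be computed directly; the only assumption added here is that 2.5 TB of RAM is read as 2500 GB, and that the network ratio compares nominal bandwidth only (Ethernet versus InfiniBand).

```python
# Figures copied from the slide; the 2.5 TB of RAM is taken here as 2500 GB.
current = {"worker nodes": 9, "CPU cores": 108, "RAM (GB)": 784,
           "storage (TB)": 30, "network (Gbps)": 10}
upgrade = {"worker nodes": 13, "CPU cores": 300, "RAM (GB)": 2500,
           "storage (TB)": 120, "network (Gbps)": 40}

# Ratio of the upgraded value to the current value, per resource.
ratios = {key: round(upgrade[key] / current[key], 2) for key in current}

for key, ratio in ratios.items():
    print(f"{key}: x{ratio}")
```

The core and RAM ratios (about 2.8x and 3.2x) are in line with the stated overall figure, while storage and network bandwidth grow by a factor of 4.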

Bioinformatic challenges

RNA-Seq data analysis: 360 M reads, optimised and parallelised pipeline

Sequence similarity search: parallel BLAST, 10,000 queries

RNA-Seq pipeline runtimes:

Step                                     Runtime on the biocluster (h)   Runtime on a PC (1 CPU)
Assembly (requires ~350 GB shared RAM)   12 (10 threads)                 >120 h
Annotation: BLAST                        96 (94 jobs)                    >>3 months
Annotation: InterPro                     32 (48 jobs)                    1.5 months
Mapping                                  4 (12*10 threads)               >10 days
Total                                    ~6 days                         >5 months
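The "94 jobs" entry for BLAST suggests that the query set is split into independent chunks, one per job. The slide does not show how the split is done, so the sketch below is only one plausible approach: it parses FASTA text and deals the queries round-robin into a chosen number of jobs. The function names are hypothetical and the BLAST invocation itself is left out.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records, n_jobs):
    """Deal records into at most n_jobs lists of similar size."""
    jobs = [[] for _ in range(n_jobs)]
    for index, record in enumerate(records):
        jobs[index % n_jobs].append(record)
    return [job for job in jobs if job]  # drop empty jobs when records < n_jobs


def to_fasta(records):
    """Write records back as FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)
```

Each chunk written with `to_fasta` could then be submitted as a separate job to the resource manager, and the per-chunk results concatenated afterwards.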

Parallel BLAST, speedup / runtime:

Nb CPUs        1                 94
blastn (nt)    1.0 / 6.1 days    105.4 / 1.4 h
blastx (nr)    1.0 / 11.6 days   88.8 / 3.1 h
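The speedups in the table can be related to the number of CPUs through the parallel efficiency (speedup divided by CPU count). The short calculation below uses only the figures from the table; the small differences between the reported speedups and those recomputed from the runtimes are presumably due to the runtimes being rounded on the slide.

```python
def parallel_efficiency(speedup, n_cpus):
    """Efficiency = speedup / number of CPUs; 1.0 means ideal linear scaling."""
    return speedup / n_cpus


def speedup_from_runtimes(serial_hours, parallel_hours):
    """Speedup = serial runtime / parallel runtime (same unit for both)."""
    return serial_hours / parallel_hours


N_CPUS = 94
reported = {"blastn (nt)": 105.4, "blastx (nr)": 88.8}  # speedups from the table
runtimes = {  # (1-CPU runtime in hours, 94-CPU runtime in hours)
    "blastn (nt)": (6.1 * 24, 1.4),
    "blastx (nr)": (11.6 * 24, 3.1),
}

for name, speedup in reported.items():
    eff = parallel_efficiency(speedup, N_CPUS)
    recomputed = speedup_from_runtimes(*runtimes[name])
    print(f"{name}: efficiency {eff:.2f}, speedup from rounded runtimes {recomputed:.1f}")
```

An efficiency slightly above 1 for blastn means the reported speedup exceeds the CPU count; the slide does not explain why, so no cause is assumed here.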

eServices and vLabs: the R-vLab

● Uses the “R” programming language

● Supports an integrated and optimized online R environment (data manipulation and computational speed-up)

● Helps overcome a severe shortage of computational power, e.g. when calculating several biodiversity indices and multivariate analyses on large matrices

How can the EGI Competence Centre help the LWG e-infrastructure increase its computational power?

eServices and vLabs: the R-vLab

Conventional Mantel compared to Parallel Mantel: ~20-fold speed-up
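The slide gives no detail on how the parallel Mantel test is implemented in the R-vLab. One common way to parallelise a permutation test is to split the permutations into independent chunks, and the sketch below does that in plain Python. It is not the R-vLab code: the function names, the chunking scheme and the per-chunk seeding are assumptions. A thread pool is used to keep the example self-contained; passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` would be the way to use several cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def upper(matrix, order):
    """Upper-triangle entries of a square matrix, rows/columns taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def count_chunk(a, b_flat, r_obs, n_perm, seed):
    """Count permutations in one chunk whose statistic reaches the observed one."""
    rng = random.Random(seed)  # independent, reproducible stream per chunk
    order = list(range(len(a)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute rows and columns of `a` together
        if pearson(upper(a, order), b_flat) >= r_obs:
            hits += 1
    return hits


def mantel(a, b, n_perm=200, workers=4, seed=0, executor_cls=ThreadPoolExecutor):
    """Return (observed r, one-sided permutation p-value) for distance matrices a, b."""
    identity = list(range(len(a)))
    b_flat = upper(b, identity)
    r_obs = pearson(upper(a, identity), b_flat)
    # Spread the permutations as evenly as possible over the workers.
    sizes = [n_perm // workers + (1 if i < n_perm % workers else 0)
             for i in range(workers)]
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(count_chunk, a, b_flat, r_obs, size, seed + i)
                   for i, size in enumerate(sizes)]
        hits = sum(future.result() for future in futures)
    return r_obs, (hits + 1) / (n_perm + 1)
```

Since the chunks share no state, the achievable speed-up is bounded mainly by the number of workers, which is consistent in spirit with the ~20-fold figure, although the slide does not state how many cores were used.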

eServices and vLabs: MicroCT

● Micro-computed tomography
● Non-destructive method of 3D X-ray microscopy
● Creation of 3D models of objects from a series of X-ray projection images

MicroCT offers:
● Collection of virtual galleries of taxa displayed and disseminated
● Manipulation of the 3D models through a series of online tools
● Download of datasets for local manipulations

How can the EGI Competence Centre help the LWG e-infrastructure with storage and image manipulation, incl. 3D models?

MicroCT: current issues

In general:
● Potentially large increase in the number of image galleries, especially from museum specimen collections (several orders of magnitude)
● Need for 3D metadata standards: dissemination and searching
● Need for 3D data annotation protocols and tools
● Need for tools to search across the scattered catalogues of galleries (centralized or distributed)

In LWG:
● MicroCT generates many image files: storage issue
● Processing and manipulating images is CPU-intensive: computing issue

LWG and EGI Competence Centre

Processing power and storage requirements

Harvesting various other repositories such as:
● Taxonomic: CoL and PESI (and components: FADA, ERMS, E+MP), WoRMS, EEA/EUNIS, ...
● Occurrences: GBIF, OBIS, ...
● Species traits: PolyTraits, FishBase, SeaLifeBase, EMODnet, ...
● Bibliography: RefBank, BHL, AnimalBase, ...
● Citizen Science: iNaturalist, ...
● Workflows: BioVeL, ...

Install mirror websites: FishBase, RefBank, GNI

Develop Web Services for disseminating LWG data:
● Concerns about performance due to Web services use

Linked Data / Linking Open Data

LifeWatchGreece principle: make data available to everybody

A number of datasets are already available as RDF in triplestores

Diagram from http://lod-cloud.net/
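To illustrate what "datasets as RDF" can look like, the sketch below serialises one occurrence record as N-Triples, using Darwin Core terms as predicates. The Darwin Core namespace is real; the base IRI under example.org, the record values and the helper names are hypothetical, and the slides do not say which vocabularies or serialisation the LWG triplestores actually use.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"        # real Darwin Core namespace
BASE = "http://example.org/lwg/occurrence/"  # hypothetical base IRI


def literal(value):
    """Quote a plain string literal using N-Triples escapes."""
    escaped = (str(value)
               .replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def occurrence_to_ntriples(occurrence_id, fields):
    """Return one N-Triples line per Darwin Core field of an occurrence."""
    subject = f"<{BASE}{occurrence_id}>"
    return [f"{subject} <{DWC}{term}> {literal(value)} ."
            for term, value in sorted(fields.items())]


# Made-up record for illustration.
lines = occurrence_to_ntriples("1", {
    "scientificName": "Posidonia oceanica",
    "decimalLatitude": "35.33",
})
print("\n".join(lines))
```

Lines in this form can be bulk-loaded into most triplestores, and reusing shared predicate IRIs is what allows such records to be linked with other open datasets.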

LifeWatchGreece Research Infrastructure, funded by the GSRT (Greek government: structural funds), is the national effort to address the above requirement and to support relevant studies.

To materialize its aim, LWG RI adheres to the central lifewatch.eu guidelines, and attempts to ally all the Greek scientific human resources working on biodiversity data and data observatories.

Coordinated by the Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC, www.imbbc.hcmr.gr) of the Hellenic Center for Marine Research (HCMR, www.hcmr.gr), LWG includes 49 partner institutions covering a wide range of scientific disciplines (terrestrial, marine and freshwater biology, zoology, botany, geography, forestry, agriculture, genetics, biotechnology, pharmacy, aquaculture, education and law).

Thank you ;)