NuGO: the European Nutrigenomics Organisation
Un Oslo
Un Munich
Un Florence
Un Balearic Illes
Un Cork
Trinity
Un. Ulster
Rowett
Un Newcastle
Un Reading
IFR
DIfE
Un Krakow
Inserm Marseille
TNO
Un Wageningen
Un Maastricht
EBI
Un Lund
Rikilt
RIVM
The NuGO Black Box project: a distributed bioinformatics infrastructure for nutrigenomics research
Tony Travis
University of Aberdeen
Rowett Institute of Nutrition and Health
NuGO is a virtual organisation
• Why?
– Management of research projects spans institutional boundaries
– Avoids duplication of effort and shares resources to solve problems effectively
• How?
– Scientists working in different research labs collaborate and share their data
– Labs develop trust relationships and share intellectual property rights
Why is data sharing important?
• It's evidence that a trust relationship exists
• Help to reconcile conflicts of interest
– Take measures to restrict access to data
– Avoid accidental 'prior disclosure'
– Prevent unauthorised or inappropriate use
– Support potential patent applications
– Permit correct attribution of scientific work
• Lack of trust is a disincentive to data sharing
NuGO is virtually organised
• Strengths
– Apply the aggregate resources of many partners to a problem, e.g. the PPS (Proof of Principle Study)
– Free exchange of ideas within NuGO
• Weaknesses
– Trust relationships are quite fragile...
– Conflicts due to 'prior disclosure' of unpublished data
– Unfair attribution of work accomplished
Utopian view
Property is theft…
Share data freely
Everyone benefits
Ideas develop
Science prospers
Not everyone agrees!
Big pharma make a profit by exploiting academic science
ISVs promote proprietary software
Knowledge is power...
Freedom is a threat
Reconciliation...
• IPR
– Intellectual property rights are important
• Freedom
– Intellectual freedom is important
• Attribution
– Supports IPR
– Defends freedom
Data management
• Single most difficult problem for science
– No simple solution to 'schema' integration
– Data centres are appropriate for business
– Business methods are not appropriate for science...
Business computing methods
• Bottom line
– Minimise the cost of ICT infrastructure
– Centralise resources
– Maximise profit
• Rigid and inflexible ICT policies
– Reduce costs
– Use industry 'standards'
– Avoid expensive 'non-standard' solutions
Scientific computing methods
• Intellectual freedom
– Maximise the benefit of ICT to scientists
– Collaborative development of software
– Freedom to innovate
• Flexible ICT policies
– User-administered PCs
– Devolution of authority
– Well supported and documented
Data centres are quite good
• The problem is business methods
– Profit, not the 'customer', is the priority
• A well-managed data centre is a good place to store your data!
• Users don't need to worry about backups and disaster recovery
• Science is sometimes underfunded, so economies of scale can be important
Let's compromise...
• Size matters
– A large, remote data centre is too big
– A typical laptop/desktop is too small
• The solution should be scaled appropriately
– Our unit of collaboration is the lab
– Let's say five or six people in a lab
– Everyone has their own PC
• We need a 'lab-scale' solution
NBX strategy
• Server-grade PC
– Designed to run 24/7/365
– Resilient to hardware failure
– Powerful enough for five or six people
• Use all available resources
– The NBX is a web server 'appliance'
– The lab PCs are clients that use the NBX
– The NBX does most of the work
– The client PCs display the results
IT policy at NuGO partners
• Limited access requested
– Ports 22 (SSH) and 80 (HTTP) open
– Tunnel insecure protocols via SSH
• Modest client PC requirements
– Java-enabled web browser
– Optional installation of Windows clients
• Remote administration of NBXs by NuGO
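Tunnelling insecure protocols via SSH typically means OpenSSH local port forwarding (`ssh -N -L local:host:remote`), so only port 22 has to be reachable. Below is a minimal Python sketch that builds such a command. The host name `nbx.example.org`, the use of the `manager` account and port 5901 (a common VNC display) are illustrative assumptions, not the actual NBX configuration.

```python
import shlex
import subprocess


def ssh_tunnel_command(user, host, local_port, remote_port, remote_bind="localhost"):
    """Build an OpenSSH command forwarding local_port on this PC to
    remote_bind:remote_port as seen from the NBX, carried over port 22."""
    return [
        "ssh",
        "-N",  # run no remote command: the session only carries the tunnel
        "-L", f"{local_port}:{remote_bind}:{remote_port}",
        f"{user}@{host}",
    ]


def open_tunnel(cmd):
    """Start the tunnel as a background process (needs an ssh client installed)."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    # Hypothetical example: reach a VNC server on the NBX via localhost:5901.
    cmd = ssh_tunnel_command("manager", "nbx.example.org", 5901, 5901)
    print(shlex.join(cmd))
```

The script prints `ssh -N -L 5901:localhost:5901 manager@nbx.example.org`. Once `open_tunnel` is running, a client that connects to `localhost:5901` on the lab PC reaches the NBX service inside the encrypted SSH session.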
Why a 'black-box' approach?
• Don't need to know how it works to use it
• Deploy a pre-configured Linux server easily
– Install from 'live' DVD on existing hardware
– Pre-installed on systems supplied by NuGO
– Reduce need for IT support in every lab
– Automatic backup and software updates
• Autonomous system
– Able to discover peer NBX systems
– Cooperate with peers to share workload
NuGO Black Box (NBX)
• Lab-scale server
– Based on Bio-Linux
– NERC/NEBC
• Web appliance
– Web browser
– Web services
• NBX network
– Bioinformatics infrastructure
NuGO data sharing network
NBX roll-out to NuGO partners
• Limited access
– Ports 22 (SSH) and 80 (HTTP) open
• Client PC
– Web browser
– Optional clients
• NBX administration
– Local "manager"
– Remote "nugo"
NuGO-Grid
• Network of NuGO-Linux servers
– Interconnected to create a Grid
• Compute Grid
– Load-balance between servers
• Data Grid
– Pool data and share resources
• P2P (peer-to-peer)
– Local control of resource sharing
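Here is a minimal sketch of how "load-balance between servers" and "local control of resource sharing" can coexist. Each NBX owner sets how many Grid jobs their box will accept. A dispatcher sends each job to the least-loaded peer that is still under its own limit. The `Peer` class, its fields and the scheduling rule are hypothetical illustrations. They are not how Kerrighed or XtreemOS balance load, since both work inside the kernel.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    max_shared_jobs: int  # limit chosen locally by the NBX owner (local control)
    jobs: list = field(default_factory=list)

    def has_capacity(self):
        return len(self.jobs) < self.max_shared_jobs

    @property
    def load(self):
        # Fraction of the locally permitted slots currently in use.
        return len(self.jobs) / self.max_shared_jobs


def dispatch(job, peers):
    """Assign job to the least-loaded peer that still accepts work.

    Returns the chosen peer's name, or None if every peer is at its limit.
    A peer with max_shared_jobs == 0 never receives Grid work.
    """
    candidates = [p for p in peers if p.has_capacity()]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: p.load)  # ties go to the first listed peer
    best.jobs.append(job)
    return best.name
```

With peers `Peer("nbx-a", 2)`, `Peer("nbx-b", 4)` and `Peer("nbx-c", 0)`, seven jobs are spread as a, b, b, a, b, b. The seventh job gets `None` because every owner's limit is reached. `nbx-c` has opted out and receives nothing.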
Current status of NuGO-Grid
Kerrighed
• Active EU FP6 project
• Funded until 2010
• http://www.kerrighed.org
• Uses ideas from openMosix and OpenSSI
Prototype NBX clusters
• Maastricht
– Four NuGO NBXs
• RINH
– Four NuGO NBXs
– Four RINH NBXs
– Eight BioSS NBXs
• Collaboration with Mario Negri Institute
• Objectives
– Aggregate the CPU and memory of locally connected NBXs
– Upgrade NBXs incrementally instead of replacing them
– Adjust resources to the scale of the problem
XtreemOS Grid operating system
• INRIA, Paris
– Kerrighed-based
– SSI kernel patch
– Grid capabilities in Linux kernel space
– No middleware
– Virtual organisations
XtreemOS proof of concept
• NuGO project at RINH and Mario Negri
– Evaluate Kerrighed and XtreemOS for NBX
– Using Bio-Linux 5.0 version of NuGO-Linux
– Prototype seven-node Kerrighed cluster
• EasyUbuntuClustering Wiki
– http://wiki.ubuntu.com/EasyUbuntuClustering
– Community-based collaborative development
– Part of the 'biobuntu' blueprint
Harnessing the power of Disruptive Technologies
A peer-to-peer approach for data sharing in clinical trials and bioinformatics research
Peer-to-peer data sharing
• Luca Clivio, Mario Negri Institute, Milan
– p2pDB for clinical data
– Case study 1: SINPE-DOMUS (in production)
  • Italian Registry of Domiciliar Artificial Nutrition
  • ~3000 patients enrolled in 60 centres
  • Each patient visited about 10 times
– Case study 2 (under test)
  • Italian Gynaecologic Ovarian Cancer Tissue Bank
  • Three tissue/cell line banks at the Oncology Dept.
SINPE-DOMUS (web model)
[Diagram: eight centres connected to a central web server]
SINPE-DOMUS (distributed DB)
[Diagram: eight centres and the Mario Negri Institute]
SINPE-DOMUS (p2pDB)
[Diagram: eight centres and the Mario Negri Institute]
Push-based p2pDB
[Diagram: four partners, two storage peers and a high-performance cluster peer, with index nodes A, B, C and D providing p2p network coordination]
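The push-based p2pDB slide names partners, storage peers and index nodes A to D. Here is a minimal sketch of one way such an index could work, under our own assumptions. Each peer pushes only an index entry (record key mapped to owning peer) to one of the index nodes, chosen by hashing the key. A query then asks that index node where the record lives instead of copying the data. The hashing scheme, class names and record keys are hypothetical and are not taken from the p2pDB design.

```python
import zlib


class IndexNode:
    def __init__(self, name):
        self.name = name
        self.entries = {}  # record key -> set of peers that hold the record

    def register(self, key, peer):
        self.entries.setdefault(key, set()).add(peer)

    def lookup(self, key):
        return sorted(self.entries.get(key, set()))


class Network:
    def __init__(self, index_names):
        self.index_nodes = [IndexNode(n) for n in index_names]

    def _node_for(self, key):
        # crc32 gives the same placement on every machine and every run,
        # unlike Python's built-in hash() for strings.
        return self.index_nodes[zlib.crc32(key.encode()) % len(self.index_nodes)]

    def push(self, peer, key):
        """A peer announces that it holds `key`; the record itself stays local."""
        self._node_for(key).register(key, peer)

    def locate(self, key):
        """Return the peers that announced `key` (empty list if none)."""
        return self._node_for(key).lookup(key)
```

For example, with `net = Network(["A", "B", "C", "D"])`, suppose both `partner-1` and `storage-peer-1` push `patient-0042`. Then `net.locate("patient-0042")` returns both peers, while the clinical record itself never leaves them.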
Proposed Infrastructure
[Diagram: ArrayExpress, GEO, Microarray Experiments DB (LIMS), Clinical Trials DB, BioBank DB (tissue banks or cell lines), microarray experiments, analysis workflow and output; contributors shown: Tony Travis (RRI/BioSS), NuGO Black Boxes (European nutrigenomics); Duccio Cavalieri (Istituto Toscano Tumori); Giovanna Chiorino (Fondo Edo Tempia)]
Unbalanced networks
• Inevitable in 'omics' research
– The huge amount of data involved does not allow a fully replicated distributed database
– Unpublished data cannot be shared without explicit agreement between collaborators
– Unverified data should not be shared at all
• p2p data sharing
– Designed for unbalanced networks
– map/reduce moves computation to data
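The phrase "map/reduce moves computation to data" can be shown in a few lines. Each peer runs the map step over its own records and returns only a small summary. Raw, possibly unpublished, data therefore never leaves the lab. This is a minimal in-process sketch: the peer names, the values and the choice of a mean as the computed statistic are assumptions for illustration.

```python
from functools import reduce


def local_map(records):
    """Runs on the peer that owns the data and returns only (sum, count)."""
    return (sum(records), len(records))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(peers):
    """peers maps a peer name to its local records (a stand-in for remote data).

    Only the per-peer (sum, count) summaries would cross the network.
    Returns None when no peer holds any records.
    """
    partials = [local_map(records) for records in peers.values()]
    total, count = reduce(combine, partials, (0, 0))
    return total / count if count else None
```

For example, `distributed_mean({"peer-1": [2.0, 4.0], "peer-2": [6.0], "peer-3": []})` returns `4.0`. Only three `(sum, count)` pairs are exchanged, however large each peer's local data set is.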
The wrong cloud*
• Amazon Web Services (AWS)
– Elastic Compute Cloud (EC2)
– Simple Storage Service (S3)
• Cost-effective if you seldom use a computer
– The more you use a computing and storage infrastructure, the less economic it becomes to rent it from someone else
• Nothing new: computer bureaux and expensive big iron
• Private clouds are the way forward
– Maximise use of resources within an organisation
* Peter Lucas, Joseph Ballay, Ralph Lombreglia (MAYA Design, Inc., March 2009).
www.maya.com/file_download/126/The%20Wrong%20Cloud.pdf
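The argument that renting becomes less economic the more you use it reduces to simple arithmetic. Renting costs the hourly rate times the hours used. Owning costs a roughly fixed annual amount. Renting is therefore cheaper only below a break-even fraction of the year. This is a minimal sketch: the prices used below are hypothetical placeholders, not AWS or NBX figures.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def annual_rent_cost(rate_per_hour, hours_used):
    return rate_per_hour * hours_used


def break_even_utilisation(own_cost_per_year, rate_per_hour):
    """Fraction of the year above which owning is cheaper than renting.

    own_cost_per_year: amortised hardware, power and admin for one server.
    rate_per_hour: price of renting an equivalent machine by the hour.
    """
    return own_cost_per_year / (rate_per_hour * HOURS_PER_YEAR)


def cheaper_option(own_cost_per_year, rate_per_hour, hours_used):
    rent = annual_rent_cost(rate_per_hour, hours_used)
    return "rent" if rent < own_cost_per_year else "own"
```

With an assumed ownership cost of 1752 per year and a rental rate of 0.50 per hour, the break-even point is 0.4 (40% of the year). A machine used for only 500 hours is cheaper to rent. A lab server running 24/7, like an NBX, is cheaper to own.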
Development of NuGO-Grid
• NBX Data Grid
– Data sharing
– NuGO-Linux
• NBX Compute Grid
– Kerrighed
– XtreemOS
NuGO-Linux USB stick
• Installation
– Workstation/server
• Rescue
– Disaster recovery
• Personal NBX
– 'live' USB stick
– Demo/evaluation
NuGO-Linux DVD 'iso' image at: http://nbx1.nugo.org/biobuntu
Contact: NuGO communications manager: [email protected]
Summary
• Viable NBX Data Grid
• Basic NBX Compute Grid
• Bio-Linux 5.0 NBXs being deployed
• XtreemOS NBX proof of concept project
• Collaboration with NEBC and NTC
• Proposed p2pDB infrastructure
• NuGO-Linux USB stick
http://www.nugo.org/nbx
Acknowledgements
NBX project
Ulrich Harttig (NIN and NBX repository)
Harrie Kools (NBX installation)
Philippe Rocca-Serra (base2)
Philip de Groot (GenePattern)
Chris Evelo, Martijn van Iersel, Thomas Kelder (Desktop)
Duccio Cavalieri (EuGene and NBX access policy)
Patrick Ahles, Charly John, Olivier Riche (NBX upgrade)
Lars Eissen, Caroline Reiff (NBX help-desk)
Marten Renkema (NuGO-Net NBX pages)
Kerrighed/XtreemOS proof of concept project
Luca Clivio, Alicia Mason (Kerrighed and XtreemOS)
Ruan Elliott (WPT coordinator and NBX tester)
p2pDB (IRFMN) project
Luca Clivio, Bioinformatics Dept.
Sergio Marchini, Oncology Dept.
Maddalena Fratelli, Biochemistry Dept.
Giovanna Chiorino, Fondo Edo Tempia, Biella
Duccio Cavalieri, Istituto Toscano Tumori, Università di Firenze