Cloud Computing for e-Science with CARMEN Paul Watson Newcastle University.
e-Science
“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it”
John Taylor
Former Director General of the UK Research Councils
Research Challenge
Understanding the brain is the greatest informatics challenge
• Enormous implications for science:
• Medicine
• Biology
• Computer Science
Collecting the Evidence
100,000 neuroscientists generate huge quantities of data
– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural
Neuroinformatics Problems
• Data is:
  • expensive to collect but rarely shared
  • in proprietary formats & locally described
• The result is:
  • a shortage of analysis techniques that can be applied across neuronal systems
  • limited interaction between research centres with complementary expertise
Data in Science
Bowker’s “Standard Scientific Model”
1. Collect data
2. Publish papers
3. Gradually lose the original data
The New Knowledge Economy & Science & Technology Policy, G.C. Bowker
Problems:
– papers often draw conclusions from data that is not published
– inability to replicate experiments
– data cannot be re-used
Codes in Science
Three stages for codes
1. Write code and apply to data
2. Publish papers
3. Gradually lose the original codes
Problems:
– papers often draw conclusions from codes that are not published
– inability to replicate experiments
– codes cannot be re-used
CARMEN
enables the sharing and collaborative exploitation of data, analysis code and expertise by researchers who are not physically collocated
CARMEN Project
UK EPSRC e-Science Pilot
£5M (2006-10)
20 Investigators
Stirling
St. Andrews
Newcastle
York
Sheffield
Cambridge
Imperial
Plymouth
Warwick
Leicester
Manchester
Newcastle: Colin Ingram, Paul Watson, Stuart Baker, Marcus Kaiser, Phil Lord, Evelyne Sernagor, Tom Smulders, Miles Whittington
York: Jim Austin, Tom Jackson
Stirling: Leslie Smith
Plymouth: Roman Borisyuk
Cambridge: Stephen Eglen
Warwick: Jianfeng Feng
Sheffield: Kevin Gurney, Paul Overton
Manchester: Stefano Panzeri
Leicester: Rodrigo Quian Quiroga
Imperial: Simon Schultz
St. Andrews: Anne Smith
CARMEN Consortium
cracking the neural code
neurone 1
neurone 2
neurone 3
raw voltage signal data typically collected using single or multi-electrode array recording
Focus on Neural Activity
Epilepsy Exemplar
Data analysis guides surgeon removing brain tissue
WARNING!
The next 2 Slides show an exposed brain
Epilepsy Exemplar
Recording from removed tissue (up to 20 GB/h)
On-line analysis by distributed collaborators will enable the experiment to be defined during data collection
Repository will enable integration of rare case types from different labs
Advances in Treatment
Data analysis guides surgeon removing brain tissue
e-Science Requirements Summary
• Sharing
  – data
  – code
• Capacity
  – vast data storage (100TB+ in CARMEN)
  – support for data-intensive analysis
CARMEN Cloud Architecture
Data storage and analysis
User access over the Internet (typically via a browser)
Users upload data & services
Users run analyses
e-Science Cloud Services
• Amazon (& Google) offer cloud computing
  – Basic storage & compute services
  – e.g. Amazon S3 & EC2
• e-Science needs a set of higher-level services to support user needs
• Which services? ....
CARMEN Cloud (CAIRN)
[Architecture diagram: Web Portals and Rich Clients access the cloud through a Security layer; behind it sit a Workflow Enactment Engine, a Registry, a Service Repository, Data and Metadata stores, and a Compute Cluster on which Services are Dynamically Deployed]
• Registry: search for data & analysis code
• Data: raw & derived data store
• Metadata: structured metadata store enabling search & annotation
• Service Repository: analysis code store
Dynasoar
• Code Repository and Deployment
  – long-term storage
• Code factored as Web Services
  – Standard (WS-I) interface
  – Internals not important
    • Java, MatLab, C, C#, C++, ...
• Deployers for a variety of service types
  – .war files (Tomcat), Virtual Machines (VMWare, Virtual PC), .NET assemblies, database stored procedures
Dynasoar: Dynamic Deployment
[Diagram: a Consumer sends a request for s4 to the Web Service Provider (1); the service is fetched from the Service Repository and deployed on a free node at the Host Provider (2); the response is returned (3)]
The deployed service remains in place and can be re-used - unlike job scheduling
Dynasoar
[Diagram: a request for s2 from the Consumer is routed by the Web Service Provider to an existing deployment of the service at the Host Provider]
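The deploy-on-demand-then-re-use behaviour described above can be sketched as follows. This is a minimal illustration of the Dynasoar idea, not its actual implementation; all service and node names are invented.

```python
# Minimal sketch of Dynasoar-style routing (illustrative names only):
# re-use an existing deployment of a service when one exists, otherwise
# fetch the service from the repository and deploy it on a free node,
# where it remains in place for later requests.

repository = {"s2": "<s2 code>", "s4": "<s4 code>", "s5": "<s5 code>"}
nodes = {"node1": {"s2", "s5"}, "node2": set(), "nodeN": {"s2"}}

def route(service):
    """Return the node that will handle a request for `service`."""
    for node, deployed in nodes.items():
        if service in deployed:
            return node                      # existing deployment: re-use it
    free = next(n for n, d in nodes.items() if not d)
    code = repository[service]               # fetch the service code...
    nodes[free].add(service)                 # ...and deploy; it stays in place
    return free

print(route("s2"))   # node1 - already deployed
print(route("s4"))   # node2 - fetched and deployed on a free node
print(route("s4"))   # node2 - the deployment is re-used
```

Unlike a job scheduler, nothing is torn down after the response: the second request for s4 finds the deployment left by the first.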
Scalability
[Chart: response time (seconds, 0-450) and number of processors in the pool (0-18), plotted against arrival rate (0.03 to 1 messages per second)]
CARMEN Cloud (CAIRN)
[Architecture diagram, annotated with the role of each component]
• Registry: search for data & analysis code
• Rich Clients: raw signal data search & visualisation
• Workflow Enactment Engine: enactment of scientific analysis processes
• Data: raw & derived data store
• Security: policies controlling access to data & code
• Metadata: structured metadata store enabling search & annotation
• Service Repository: analysis code store
Controlled Sharing
A scientist's sharing policy changes over time:
• "Only I am allowed to see this data"
• "My collaborators can now see it"
• "Everyone can see it"
Security Solution
• XACML – standard way to encode rules as (subject, action, resource) triples
• Rules checked on each access
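The triple-based check can be sketched in Python as below. This is only an illustration of the idea; real CARMEN policies are written in XACML, and all subject, group and resource names here are invented.

```python
# Illustrative sketch of access control as (subject, action, resource)
# triples, in the spirit of XACML permit rules. All names are invented.

rules = {
    ("alice", "read", "dataset42"),
    ("alice", "write", "dataset42"),
    ("collaborators", "read", "dataset42"),
}

def is_permitted(subject, action, resource, groups=()):
    """Permit if a rule matches the subject directly, or one of its groups."""
    if (subject, action, resource) in rules:
        return True
    return any((g, action, resource) in rules for g in groups)

print(is_permitted("alice", "write", "dataset42"))   # True
print(is_permitted("bob", "read", "dataset42",
                   groups=("collaborators",)))       # True
print(is_permitted("bob", "write", "dataset42"))     # False
```

Checking the rules on each access, as the slide describes, means a change of policy (e.g. adding a collaborators rule) takes effect immediately.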
Controlled Sharing - conflicts
• Scientist: "Only I am allowed to see this data"; "My collaborators can now see it"
• Funder: "All data must be accessible to everyone after the end of the project"
Addressing Conflicts
• Each party expresses policy as XACML rules
• Rules are converted to a formal language
  – XACML -> VDM++
• Run the formal model to detect conflicts
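A toy version of the conflict check might look like this. The real pipeline translates XACML into VDM++ and runs a formal model; this Python sketch only shows the kind of clash being detected, with invented names.

```python
# Toy conflict detector (the real CARMEN approach translates XACML to
# VDM++ and runs a formal model). Each party's policy maps a
# (subject, action, resource) triple to an effect; a conflict is any
# triple to which two parties assign different effects.

scientist = {("public", "read", "dataset42"): "deny"}
funder    = {("public", "read", "dataset42"): "permit"}  # after project end

def conflicts(*policies):
    merged, clashes = {}, []
    for policy in policies:
        for triple, effect in policy.items():
            if merged.get(triple, effect) != effect:
                clashes.append(triple)      # two parties disagree
            merged[triple] = effect
    return clashes

print(conflicts(scientist, funder))  # [('public', 'read', 'dataset42')]
```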
CARMEN CAIRN
[Architecture diagram, annotated with the technology behind each component]
• OMII: Grimoire
• DAME: Signal Data Explorer
• OMII/myGrid: Taverna
• OGSA-DAI, SRB, DAME
• Gold: Role & Task based Security
• myGrid & CISBAN
• Dynasoar
Using CARMEN for a typical scenario
1. Data Collection from a Multi-Electrode Array
2. Data Visualisation and Exploration
3. Spike Detection
4. Spike Sorting
5. Analysis
6. Visualisation of Analysis Results
Currently this is a semi-manual process; CARMEN has automated it.
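As one illustration of what a step in this pipeline does, here is a deliberately simple sketch of spike detection (step 3) as an amplitude-threshold detector over a raw voltage trace. CARMEN's actual detectors are contributed analysis services; this code and its data are purely illustrative.

```python
# Simple amplitude-threshold spike detector (illustrative only; the
# real pipeline uses analysis services deployed on the CARMEN cloud).

def detect_spikes(trace, threshold):
    """Return sample indices where the voltage crosses the threshold upward."""
    spikes = []
    for i in range(1, len(trace)):
        if trace[i - 1] < threshold <= trace[i]:
            spikes.append(i)
    return spikes

# A toy voltage trace with two upward threshold crossings:
trace = [0.0, 0.1, 0.9, 0.2, 0.0, 1.1, 1.2, 0.3]
print(detect_spikes(trace, 0.8))  # [2, 5]
```

Spike sorting (step 4) would then group the detected events by waveform shape to attribute them to individual neurones.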
Running the Workflow
[Workflow diagram: a TAVERNA workflow engine, behind the Security layer, queries the Registry for available services; INPUT data comes from the SRB File System; Spike Sorting and Reporting services are dynamically deployed in Dynasoar; OUTPUT metadata is written to an RDBMS; results are returned to an external client]
CARMEN (www.carmen.org.uk)
• is delivering an e-Science infrastructure that can be applied across a diverse range of applications
• uses a Cloud / Software as a Service architecture
• enables cooperation and interdisciplinary working
• aims to deliver new results in neuroscience, computer science and medicine