Gridforum David De Roure Newe Science 20080402

Eindhoven Edition


Gridforum.nl Annual Business Day 2008

Transcript of Gridforum David De Roure Newe Science 20080402

Page 1: Gridforum David De Roure Newe Science 20080402

Eindhoven Edition

Page 2: Gridforum David De Roure Newe Science 20080402

Due to the complexity of the software and the backend infrastructural requirements, e-Science projects usually involve large teams managed and developed by research laboratories, large universities or governments.

e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.

Page 3: Gridforum David De Roure Newe Science 20080402

How do we know when e-Science has succeeded?

Not just accelerated but new

A. When everyone is using the Grid

B. When there are routine scientific advances that would not have happened otherwise

Page 4: Gridforum David De Roure Newe Science 20080402

How do we move from heroic scientists doing heroic science with heroic infrastructure to everyday scientists doing science they couldn’t do before?

humanists, archaeologists, geographers, musicologists... researchers!

research

It’s the democratisation of e-Research

Page 5: Gridforum David De Roure Newe Science 20080402

scientists

LocalWeb

Repositories

Digital Libraries

Graduate Students

Undergraduate Students

Virtual Learning Environment

Technical Reports

Reprints

Peer-Reviewed Journal & Conference Papers

Preprints & Metadata

Certified Experimental Results & Analyses

experimentation

Data, Metadata, Provenance, Workflows, Ontologies

The social process of science

Page 6: Gridforum David De Roure Newe Science 20080402

Between 19th October and 23rd November 2007 I attended six international meetings related to e-Science:

• Grid 2007
• Scientific and Scholarly Workflows
• e-Social Science 2007
• W3C
• Open Grid Forum
• Microsoft e-Science

This is what I found

Page 7: Gridforum David De Roure Newe Science 20080402

1. Everyday researchers doing everyday research

• Not just a specialist few doing heroic science with heroic infrastructure
• Chemists are blogging the lab
• Everyone is mashing up
• Everyday hardware – multicore machines and mobile devices

Page 8: Gridforum David De Roure Newe Science 20080402

2. A data-centric perspective, like researchers

• Data is large, rich, complex and real-time
• There is new value in data, through new digital artefacts and through metadata, e.g. context, provenance, workflows
• This isn’t “anti-computation” – design interaction around data

Page 9: Gridforum David De Roure Newe Science 20080402

3. Collaborative and participatory

• The social process of science revisited in the digital age
• Collaborative tools – blogs and Wikis
• e-Science now focuses on publishing as well as consuming
• Scholarly lifecycle perspective

Page 10: Gridforum David De Roure Newe Science 20080402

4. Benefitting from the scale of digital science activity to support science

• This is new and powerful!
• Community intelligence
• Review
• Usage informing recommendation
• e.g. OpenWetWare
• e.g. myExperiment

Page 11: Gridforum David De Roure Newe Science 20080402

5. Increasingly open

• Preprints servers and institutional repositories
• Open journals
• Open access to data
• Science Commons
• Object Reuse & Exchange

Page 12: Gridforum David De Roure Newe Science 20080402

6. Better not Perfect

• The technologies people are using are not perfect
• They are better
• They are easy to use
• They are chosen by scientists

Page 13: Gridforum David De Roure Newe Science 20080402

7. Empowering researchers

• The success stories come from the researchers who have learned to use ICT
• Domain ICT experts are delivering the solutions
• Anything that takes away autonomy will be resisted

Page 14: Gridforum David De Roure Newe Science 20080402

8. About pervasive computing

• e-Science is about the intersection of the digital and physical worlds
• Sensor networks
• Mobile handheld devices

Page 15: Gridforum David De Roure Newe Science 20080402

Signs of the Times

1. Everyday researchers doing everyday research
2. A data-centric perspective, like researchers
3. Collaborative and participatory
4. Benefitting from the scale of digital science activity to support science
5. Increasingly open
6. Better not Perfect
7. Empowering researchers
8. About pervasive computing

Page 16: Gridforum David De Roure Newe Science 20080402

Onward and Upward

• e-Science is now enabling researchers to do some completely new stuff!
• As the individual pieces become easy to use, researchers can bring them together in new ways and ask new questions
• “The next level”

“Standing on the shoulders of giants”

www.w3.org/2007/Talks/www2007-AnsweringScientificQuestions-Ruttenberg.pdf

(Everyday researchers are giants too)

Page 17: Gridforum David De Roure Newe Science 20080402

Note to Reader. The next slides are not intended to be anti-grid. Everyone working on Grid is doing great work.

Page 18: Gridforum David De Roure Newe Science 20080402

The Grid Problem

• Everyday researchers doing everyday research, BUT heroic Grid infrastructure not being adopted
• A data-centric perspective, like researchers, BUT Grid gives APIs to computation not data
• Collaborative and participatory, BUT Grid has deeply rooted service provider mindset
• Better not Perfect, BUT Grid aims to provide well-engineered perfect solution
• Giving autonomy to researchers, BUT Grid has feel of institutional control (at this time)
• About pervasive computing, BUT Grid is about portals, not the next generation of users

Page 19: Gridforum David De Roure Newe Science 20080402

The Arrow Problem: e-Science Pipeline (Malcolm Atkinson)

Diagram labels: CS Research, EE Research and Applications Research; e-Science Technology Creators & Integrators; e-Science bespoke tailoring; Mass Use by Researchers; Socio-economic & Commercial Innovation.

Timeline: 5 years, 5 years, 5 years.

Scale: 10s of integrators, 100s of embedded consultants, 1000s of research users.

NB This isn’t wrong!

Page 20: Gridforum David De Roure Newe Science 20080402

Don’t think rollout of technologies...

Think roll-in of researchers...

Mass Use by Researchers

Knowledge co-production vs Service Delivery!

Page 21: Gridforum David De Roure Newe Science 20080402

Web Services RESTful APIs cmd lines ssh http

Web Browser Mobile phone iPod Car Equipment PDA

P2P

mashups

workflows

services

applications

Subject ICT experts

Computer Scientists

Software Companies

Workflow tools

Ruby on Rails

ecosystem

Scientists

open source

Software Engineers

NeSC

OeRC

Page 22: Gridforum David De Roure Newe Science 20080402

For a flourishing ecosystem...

• It’s about empowerment as well as provision
• People power – the new instrument of scale!
• Hence usability:
– Simple/familiar interfaces for users
– Simple/familiar interfaces for developers
– No need for a summer school!
• Step into user space and look back
• Computer Scientists as facilitators and problem solvers(?)

Page 23: Gridforum David De Roure Newe Science 20080402

But what about Web 2.0?!

• Wikis
• Mashups
• REST APIs
• Google Maps
• Technologies:
– AJAX, JSON, Ruby on Rails, ...
• Social networking
• Web as a distributed application platform
– Amazon S3 and EC2

Page 24: Gridforum David De Roure Newe Science 20080402

Web 2.0 patterns

The Long Tail
Data is the Next Intel Inside
Users add value
Network effects by default
Some Rights Reserved
The Perpetual Beta
Cooperate, don’t Control
Software above the level of the single device

www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Signs of the Times

1. Everyday researchers doing everyday research
2. A data-centric perspective, like researchers
3. Collaborative and participatory
4. Benefitting from the scale of digital science activity
5. Increasingly open
6. Better not Perfect
7. Empowering researchers
8. About pervasive computing

Page 25: Gridforum David De Roure Newe Science 20080402
Page 26: Gridforum David De Roure Newe Science 20080402

use Web 2.0 here?

Grid

Page 27: Gridforum David De Roure Newe Science 20080402

use Web 2.0

here?

Grid

Page 28: Gridforum David De Roure Newe Science 20080402

Grid

use Web 2.0 here

Grid, cloud, HPC

Page 29: Gridforum David De Roure Newe Science 20080402

A utility is a directly and immediately useable service with established functionality, performance and dependability, illustrating the emphasis on user needs and issues such as trust

Services are knowledge-assisted (‘semantic’) to facilitate automation and advanced functionality, the knowledge aspect reinforced by the emphasis on delivering high level services to the user

The architecture comprises services which may be instantiated and assembled dynamically, hence the structure, behaviour and location of software is changing at run-time

Service-Oriented Knowledge Utility

semanticgrid.org/NGG3

Page 30: Gridforum David De Roure Newe Science 20080402

If you peel back the label and its says “Grid” or “OGSA” underneath… its not a cloud.
If you need to send a 40 page requirements document to the vendor then… it is not cloud.
If you can’t buy it on your personal credit card… it is not a cloud
If they are trying to sell you hardware… its not a cloud.
If there is no API… its not a cloud.
If you need to rearchitect your systems for it… Its not a cloud.
If it takes more than ten minutes to provision… its not a cloud.
If you can’t deprovision in less than ten minutes… its not a cloud.
If you know where the machines are… its not a cloud.
If there is a consultant in the room… its not a cloud.
If you need to specify the number of machines you want upfront… its not a cloud.
If it only runs one operating system… its not a cloud.
If you can’t connect to it from your own machine… its not a cloud.
If you need to install software to use it… its not a cloud.
If you own all the hardware… its not a cloud.

James Governor

Page 31: Gridforum David De Roure Newe Science 20080402

Multicore chips will offer so much performance that we need not cobble together heterogeneous resources but rather can deploy simple powerful systems

Geoffrey Fox

Page 32: Gridforum David De Roure Newe Science 20080402

Myths

• Web 2.0 is not high performance
– It improves the performance of science and people!
• Web 2.0 is not a properly engineered solution
– Scientists want better, not perfect. And agility.
• Web 2.0 is not secure
– People do lots of “secure” things on the Web
• Web 2.0 is a fad that will pass
– It’s inevitable and it’s already happened!
• Web 2.0 works for teenagers but it won’t for scientists
– See OpenWetWare
• Web 2.0 lets the oiks in and this is a bad thing
– Now we can do peer review even better!

Page 33: Gridforum David De Roure Newe Science 20080402

N²

N

N

Page 34: Gridforum David De Roure Newe Science 20080402

One Middleware

2N

N

N

Page 35: Gridforum David De Roure Newe Science 20080402

Middleware?

N

N

Middleware, Middleware, Middleware, Middleware, Middleware

Polynomial involving N1, N2 and M
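
The N² and 2N figures on the preceding slides can be checked with simple arithmetic. The sketch below assumes N clients and N services, one adapter per client/service pairing when everything is wired point to point, and one adapter per endpoint when a single middleware sits in between. The function names are illustrative and do not come from the talk. The polynomial for several middlewares is not given on the slide, so it is deliberately not modelled here.

```python
def direct_adapters(n_clients: int, n_services: int) -> int:
    """Adapters needed when every client talks to every service directly."""
    return n_clients * n_services


def single_middleware_adapters(n_clients: int, n_services: int) -> int:
    """Adapters needed when each endpoint only has to speak to one middleware."""
    return n_clients + n_services


if __name__ == "__main__":
    # Compare the two wiring schemes for a few sizes of N.
    for n in (2, 10, 100):
        print(n, direct_adapters(n, n), single_middleware_adapters(n, n))
```

With N on each side the first function grows as N² and the second as 2N, which is the gap the slides illustrate.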

Page 36: Gridforum David De Roure Newe Science 20080402

www.myexperiment.org

Page 37: Gridforum David De Roure Newe Science 20080402

Workflows are the new rock and roll

Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources

The era of Service Oriented Applications

Repetitive and mundane boring stuff made easier

E. Science laboris

Carole Goble

Page 38: Gridforum David De Roure Newe Science 20080402

Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle

Paul meets Jo. Jo is investigating Whipworm in mouse.

Jo reuses one of Paul’s workflows without change.

Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.

Previously a manual two year study by Jo had failed to do this.

Recycling, Reuse, Repurposing
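
The reuse story rests on a workflow being a fixed chain of service calls that does not depend on the data fed into it. Here is a minimal sketch, assuming a workflow can be modelled as an ordered list of Python callables. The step functions, gene identifiers and lookup table are hypothetical stand-ins for real scientific services, and this is not how Taverna itself represents workflows.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def make_workflow(steps: Sequence[Step]) -> Step:
    """Compose steps into one callable that runs them in order."""
    def run(data: Any) -> Any:
        return reduce(lambda value, step: step(value), steps, data)
    return run


# Hypothetical stand-ins for remote services.
def normalise(gene_ids):
    """Trim, upper-case and de-duplicate identifiers."""
    return sorted({g.strip().upper() for g in gene_ids})


def lookup_pathways(gene_ids):
    """Map known identifiers to pathway names (toy table, not real data)."""
    table = {"GENE_A": "pathway_1", "GENE_B": "pathway_2"}
    return [table[g] for g in gene_ids if g in table]


def summarise(pathways):
    """Count how often each pathway was hit."""
    return {p: pathways.count(p) for p in set(pathways)}


# One workflow definition, written once...
pathway_workflow = make_workflow([normalise, lookup_pathways, summarise])

# ...and run unchanged on two different inputs.
paul_result = pathway_workflow([" gene_a", "GENE_B"])
jo_result = pathway_workflow(["gene_b", "gene_b ", "gene_c"])
```

The point of the design is that `pathway_workflow` is defined once and shared; only the input changes between the two runs.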

Page 39: Gridforum David De Roure Newe Science 20080402

Chart: Taverna downloads per day, 2003 to 2007

taverna.sourceforge.net

Page 40: Gridforum David De Roure Newe Science 20080402

The Superclient

• Run on your laptop – no sysadmin required
• Access independent third party world-wide service providers of applications, tools and datasets
– 850 databases, 166 web servers (Nucleic Acids Research, Jan 2006)
• My local applications, tools and datasets. In the Enterprise. In the laboratory.
• Easily incorporate new services without coding

Page 41: Gridforum David De Roure Newe Science 20080402

Kepler

Triana

BPEL

Ptolemy II

Page 42: Gridforum David De Roure Newe Science 20080402

myExperiment.org is...

• “Facebook for Scientists”... but different to Facebook!
• A community social network
• A gateway to other publishing environments
• A federated repository
• A platform for launching workflows
• Publishing self-describing Encapsulated myExperiment Objects
• Mindful publication
• Started March 2007
• Closed beta since July 2007
• Open beta November 2007

Page 43: Gridforum David De Roure Newe Science 20080402
Page 44: Gridforum David De Roure Newe Science 20080402

Google Gadget

Page 45: Gridforum David De Roure Newe Science 20080402

Ownership and Attribution

Page 46: Gridforum David De Roure Newe Science 20080402

24/5/2007 | myExperiment | Slide 46

Page 47: Gridforum David De Roure Newe Science 20080402


users

descriptions

groups

friendships

tags

Enactor

blobs

workflows

HTML

XML

Snapshot map of resources with their relationships and versions

Page 48: Gridforum David De Roure Newe Science 20080402

scientists

LocalWeb

Repositories

Graduate Students

Undergraduate Students

Virtual Learning Environment

Technical Reports

Reprints

Peer-Reviewed Journal & Conference Papers

Preprints & Metadata

Certified Experimental Results & Analyses

experimentation

Data, Metadata, Provenance, Workflows, Ontologies

Digital Libraries

The social process of science 2.0

Page 49: Gridforum David De Roure Newe Science 20080402

Take Homes 2.0

• e-Research is about doing new research
• Grid is just one part of the solution
• Users are not just consumers of infrastructure. Empower them.
• Web 2.0 is a set of design patterns
• Think Web 2.0 coupling Grid and other services
• Workflows make e-Science easier, and Web 2 makes workflows easier

Page 50: Gridforum David De Roure Newe Science 20080402

Contact

David De Roure [email protected]

Carole Goble [email protected]

Thanks

Malcolm Atkinson, Geoffrey Fox, Jeremy Frey, Savas Parastatidis, The myGrid Family

Page 51: Gridforum David De Roure Newe Science 20080402

Provenance

Harvesting

myExperiment metadata bus

ORE

RDF Store

Encapsulated myExperiment Object (EMO)

Metadata

Page 52: Gridforum David De Roure Newe Science 20080402

OAI-ORE: Object Reuse and Exchange

ReM = Resource Map, A = Aggregation, AR = Aggregated Resource

http://www.openarchives.org/ore/0.1/datamodel-overview

Page 53: Gridforum David De Roure Newe Science 20080402

Anatomy of an EMO

EMO Metadata: creator, modified, rights

URIs into myExperiment(s) with types and comments: workflow, data, description

URIs to external resources, with alternates, types, comments, versions

Optional annotations of URIs and their relationships
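
The anatomy above can be read as a small data structure. The following is a minimal sketch, assuming an EMO is a bundle of metadata plus typed URI references. The class names, field names and example.org URIs are hypothetical and are not the myExperiment schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InternalRef:
    """URI into a myExperiment instance, with a type and a comment."""
    uri: str
    kind: str  # e.g. "workflow", "data" or "description"
    comment: str = ""


@dataclass
class ExternalRef:
    """URI to an external resource, with alternates, type, comment, version."""
    uri: str
    kind: str
    alternates: List[str] = field(default_factory=list)
    comment: str = ""
    version: Optional[str] = None


@dataclass
class Annotation:
    """Optional note on one URI, or on its relationship to another URI."""
    subject_uri: str
    text: str
    related_uri: Optional[str] = None


@dataclass
class EMO:
    """Hypothetical Encapsulated myExperiment Object."""
    creator: str
    modified: str
    rights: str
    internal: List[InternalRef] = field(default_factory=list)
    external: List[ExternalRef] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def uris(self) -> List[str]:
        """All URIs the object points at, internal references first."""
        return [r.uri for r in self.internal] + [r.uri for r in self.external]


# Illustrative instance; every value here is made up.
emo = EMO(
    creator="Paul",
    modified="2008-04-02",
    rights="Some Rights Reserved",
    internal=[InternalRef("http://example.org/workflows/1", "workflow")],
    external=[ExternalRef("http://example.org/data/7", "data", version="2")],
    annotations=[Annotation("http://example.org/workflows/1", "uses",
                            related_uri="http://example.org/data/7")],
)
```

Keeping the references as URIs rather than embedded content matches the slide's description of an EMO as pointing into myExperiment and out to external resources.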

Page 54: Gridforum David De Roure Newe Science 20080402

Linked Data

Page 55: Gridforum David De Roure Newe Science 20080402

TAVERNA FUNCTIONAL LANGUAGE SHOCK!

RESEARCH DAILY

British Scientists revealed today that Taverna is in fact a functional language. In a police statement, Taverna creator Tom Oinn said “it’s a fair cop guv”...

Advertisement

New Improved

Closurize and Concentrate TM

Add Lambda Calculus to your Lambda Network! Satisfaction guaranteed in several different colours

Page 56: Gridforum David De Roure Newe Science 20080402

Quality Workflows

Original workflow

High-level design of quality filter

Declarative specification

Compilation to quality workflow

New quality filter

Integration

Quality-aware workflow

• Declarative spec is formal (XML)
• Compilation is automated
• QW follows predictable pattern; integration also automated

Paolo Missier

Page 57: Gridforum David De Roure Newe Science 20080402

Malcolm Atkinson