Post on 13-Dec-2015

Exploring ‘Workspaces’

Tom Visser, SARA compute and networking services, Amsterdam

Garching Workshop 21st September 2010

• Background
• Overview of cases
• Technical possibilities
• Opportunities and risks
• Expected results
• Proposed approach

The CLARIN-NL connection

• Seeking to create an infrastructure for language resources
• Providing access to tools and technologies
• CLARIN-NL and BiG Grid are exploring possibilities
• The WHOLE pipeline
– Creating
– Curation
– Collecting
– DO SCIENCE
– Depositing

Already

• SARA has developed a client implementation of a Persistent Identifier Service (HANDLE) and has become an EPIC consortium member
• Instance of service currently hosted at SARA
• BiG Grid / SURFnet pilot with a short-lived credential service
• Activities with Computational Linguistics (e.g. Named Entity Recognition) & forthcoming Computational Humanities institute (KNAW)
• Series of workshops to find common ground between BiG Grid and the CLARIN infrastructure
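To make the persistent identifier bullet concrete: a Handle has the form prefix/suffix and can be resolved through the public hdl.handle.net proxy. The sketch below only shows the shape of such an identifier and how a resolver URL is built from it. It is not SARA's client implementation; the function names and the example handle `12345/demo-object` are hypothetical.

```python
from urllib.parse import quote

# Public Handle System proxy; resolving <proxy>/<prefix>/<suffix> redirects
# to the location registered for that handle.
HANDLE_PROXY = "https://hdl.handle.net"


def split_handle(handle):
    """Split 'prefix/suffix' at the first slash (hypothetical helper)."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return prefix, suffix


def resolver_url(handle):
    """Build the proxy URL for a handle, percent-encoding unsafe characters."""
    prefix, suffix = split_handle(handle)
    return f"{HANDLE_PROXY}/{quote(prefix)}/{quote(suffix, safe='/')}"


# '12345/demo-object' is a made-up handle used for illustration only.
print(resolver_url("12345/demo-object"))
```

Because the identifier stays fixed while the registered location can change, a workspace can hand out such handles for its objects without tying them to a particular storage site.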

Questions of today

• When is a user workspace service?
• Why do we need user workspaces?
• What are their characteristics in a distributed environment?
• How do we support processing chains in distributed environments driven by community environments?
• Are there generic frameworks for the execution of distributed processing chains and deployment of web-services?

Core problems

• Where to store
• How to store
• How to access
• How to foster collaboration amongst people
• How to support: Data discovery, exploration and exploitation
• How to realize such a service
• What SLA / service description / responsibilities

What it should be

• A temporary storage place (days, weeks, years)
– Global home / global scratch
– A ‘logical mount point’
• Accessible by web services
• Meaningfully accessible by a human
• Autonomy to communities
– Instantiate
– Content
– Control
• Identifiable
• Store digital objects and metadata
• Journaling (register interactions)
• Create
• Read
• Write
• Update
• Grant access to (Authorization)
• List contents
• Search contents
– Adopting & offering known best practices and services in the ecosystem
• …
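The operations listed above (write, read, grant access, list, search, journaling) can be pictured as a small interface. The following is a minimal in-memory sketch under stated assumptions: the class and method names are hypothetical, only the owner may write, and a real service would sit on top of a storage back end rather than a Python dictionary.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DigitalObject:
    data: bytes
    metadata: dict


@dataclass
class Workspace:
    """Hypothetical workspace: identifiable, stores objects plus metadata,
    and journals every successful interaction."""
    owner: str
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    objects: dict = field(default_factory=dict)
    readers: set = field(default_factory=set)
    journal: list = field(default_factory=list)

    def _log(self, user, action, name=None):
        # Journaling: register who did what, to which object, and when.
        self.journal.append((datetime.now(timezone.utc), user, action, name))

    def _require_reader(self, user):
        if user != self.owner and user not in self.readers:
            raise PermissionError(f"{user} has no access to this workspace")

    def _require_owner(self, user):
        if user != self.owner:
            raise PermissionError(f"{user} is not the workspace owner")

    def grant_access(self, user, grantee):
        self._require_owner(user)
        self.readers.add(grantee)
        self._log(user, "grant", grantee)

    def write(self, user, name, data, metadata=None):
        # Covers both create and update: an existing name is overwritten.
        self._require_owner(user)
        self.objects[name] = DigitalObject(data, dict(metadata or {}))
        self._log(user, "write", name)

    def read(self, user, name):
        self._require_reader(user)
        self._log(user, "read", name)
        return self.objects[name].data

    def list_contents(self, user):
        self._require_reader(user)
        self._log(user, "list")
        return sorted(self.objects)

    def search(self, user, key, value):
        # Search by exact match on one metadata field.
        self._require_reader(user)
        self._log(user, "search")
        return sorted(n for n, o in self.objects.items()
                      if o.metadata.get(key) == value)
```

A community would instantiate a `Workspace`, write objects with metadata, and grant access to collaborators; the `journal` list then records each interaction in order.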

Considered technical possibilities

• iRODS
• Cloud platform (SNIA/CDMI)
• HADOOP implementation
• AMAZON S3 / OpenCloud / Azure /

Risks and opportunities

• Creating something that is only generic - specific
• Looking uphill, but what will you know when you’ve climbed the hill
• Knowledge of the community
• Epistemological problems
• Bootstrapping
• Trust
• Process focus: we are starting a small-scale pilot within 1 month, with short iterations, keeping everyone involved.

Approach: BiG Grid and Dutch partners

• Many interesting addressable cases
– Keyword extraction from Dutch audio and film institute
– MPI video repository annotations
– City of Den Haag government proceedings: minutes and video alignment (feature extraction)
– OCR & Machine learning on Dutch handwritings
• Expected results
– Common understanding of a workspace service
– Bootstrap implementation vertically crossing all layers

• When is a user workspace service?
– When it is used and has become an indispensable tool
• Why do we need user workspaces?
– To be able to flexibly work with data
– Initiate collaborations
– Have a trustable storage resource available
• What are their characteristics in a distributed environment?
– Clear core functionality, many service providers, integration with identity providers
• How do we support processing chains in distributed environments driven by community environments?
– By having open, known, and easily accessible services
• Are there generic frameworks for the execution of distributed processing chains and deployment of web-services?
– Yes!

THANK YOU