20150126-Globus-USUK-Data-Workshop-Blaiszik-final
Transcript of 20150126-Globus-USUK-Data-Workshop-Blaiszik-final
![Page 1: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/1.jpg)
Globus Scientific Data Publication Services
Ben Blaiszik, Kyle Chard, Rachana Ananthakrishnan, Steve Tuecke, Ian Foster, Globus Team
[email protected] www.globus.org
Computation Institute
![Page 2: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/2.jpg)
Overview
• What is Globus?
• Globus Services
  – Data publication
  – Data cataloging
  – Data transfer
  – User authentication
  – Groups
  – Sharing
2
![Page 3: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/3.jpg)
Globus is ...
Research data management delivered via SaaS
• > 8000 endpoints
• > 85 U.S. campuses
• European Globus Community: http://www.egcf.eu/
![Page 4: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/4.jpg)
Globus Delivers
Big data transfer, sharing, publication, and discovery…
…directly from your own storage systems OR the cloud
![Page 5: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/5.jpg)
SaaS Market Domination...
…for your photos
…for your e-mail
…for your entertainment
…for your research data
5
![Page 6: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/6.jpg)
Research data management scenarios and challenges
6
![Page 7: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/7.jpg)
“I need to easily, quickly, & reliably move or mirror portions of my data to other places.”
[Diagram: Scientific Instrumentation, Research Computing HPC Cluster, Lab Server, Personal Laptop, XSEDE Resource, Public Cloud]
![Page 8: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/8.jpg)
“I need to easily and securely share my data with my colleagues at other institutions.”
8
![Page 9: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/9.jpg)
“I need to publish my data so that others can find it and use it.”
Scholarly Publication
Reference Dataset
Active Research Collaboration
9
![Page 10: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/10.jpg)
Globus Transfer
• “Fire-and-forget” transfers
  – Optimize transfer
  – Automatic fault recovery
  – Automatic retry
  – Seamless security integration
  – 128-bit checksums
• Intuitive Web GUI and powerful APIs for automation
  – REST and Python APIs
[Diagram: transfer between two secure endpoints, A and B (e.g. laptop, e.g. midway)]
• You submit a transfer request
• Globus moves the data for you
• Globus notifies you once the transfer is complete
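The automatic retry and checksum verification listed on this slide can be illustrated locally. The following is a minimal sketch using only the Python standard library; it is not the Globus API. The names `md5sum`, `copy_with_retry`, and `max_attempts`, the file names, and the choice of MD5 as the 128-bit checksum are assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5sum(path: Path) -> str:
    """Return the 128-bit MD5 digest of a file as a hex string."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_retry(src: Path, dst: Path, max_attempts: int = 3) -> int:
    """Copy src to dst, retrying until the destination checksum matches.

    Returns the number of attempts used; raises OSError if every attempt fails.
    """
    expected = md5sum(src)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(src, dst)
            if md5sum(dst) == expected:
                return attempt
        except OSError:
            pass  # treat the error as a transient fault and try again
    raise OSError(f"transfer of {src} failed after {max_attempts} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "scan_0001.dat"  # hypothetical file name
        source.write_bytes(b"detector frame" * 1000)
        target = Path(tmp) / "copy.dat"
        attempts = copy_with_retry(source, target)
        print(f"verified copy after {attempts} attempt(s)")
```

In a hosted service the retry loop runs on the service side, which is what lets the user submit the request and walk away.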
![Page 11: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/11.jpg)
Globus, the Abridged Version
[Diagram: Metadata Layer and Data Layer, with components Catalog, Data Publication, Discover, Transfers, Sharing, Groups, User Auth, Endpoint File Systems, and Plugin Point [Federation?]]
* REST and Python APIs throughout
![Page 12: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/12.jpg)
Globus Catalog
• Automate metadata ingestion from instrumentation and acquisition machines
  – API/CLI integration
• Allow near real-time metadata-driven feedback to experiments
• Allow for insert points in the workflow
  – Ingest at point of collection
  – Catalog metadata and provenance
  – Push to data store
  – Push to local or external HPC
• Allow building and sharing of typed metadata definitions
  – e.g. build definition set that specifically fits X-ray scattering data at your beamline
  – Addresses problem of T, temp, Temp, temperature, temperature_kelvin, ...
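The naming problem in the last bullet (T, temp, Temp, temperature, temperature_kelvin) is what a shared, typed definition set is meant to solve. The sketch below shows one possible shape for such a definition set in plain Python. It is not the Globus Catalog API; `FieldDefinition`, `BEAMLINE_DEFINITIONS`, `normalize`, and every field name other than the temperature aliases are hypothetical, and the sketch assumes all temperature values are already in kelvin.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class FieldDefinition:
    """One typed metadata field plus the ad hoc names it replaces."""
    name: str
    cast: Callable[[Any], Any]
    aliases: Tuple[str, ...] = ()


# Hypothetical definition set for an X-ray scattering beamline.
# Assumption: every temperature alias already carries a value in kelvin.
BEAMLINE_DEFINITIONS = (
    FieldDefinition("temperature_kelvin", float, ("T", "temp", "Temp", "temperature")),
    FieldDefinition("exposure_seconds", float, ("exposure", "exp_time")),
    FieldDefinition("sample_name", str, ("sample",)),
)


def normalize(raw: Dict[str, Any], definitions=BEAMLINE_DEFINITIONS) -> Dict[str, Any]:
    """Map aliased keys onto canonical names and cast values to the declared type."""
    lookup = {}
    for definition in definitions:
        for key in (definition.name, *definition.aliases):
            lookup[key] = definition
    clean = {}
    for key, value in raw.items():
        definition = lookup.get(key)
        if definition is None:
            raise KeyError(f"no metadata definition for {key!r}")
        clean[definition.name] = definition.cast(value)
    return clean


if __name__ == "__main__":
    print(normalize({"Temp": "293.15", "sample": "sample_A"}))
```

Rejecting unknown keys, rather than passing them through, is a design choice here: it forces new field names to be added to the shared definition set instead of accumulating as one-off spellings.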
12
![Page 13: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/13.jpg)
Globus Catalog: Catalog → Datasets → Members
• Group data based on use and features, not location/filename
  – Logical grouping to organize, search, and describe
• Operate on datasets as units
• Tag datasets with characteristics that reflect content
• Share/move datasets for collaboration
• Interact via REST API, Python API, GUI, and CLI
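The catalog, dataset, member hierarchy on this slide can be pictured as three nested record types, with tags carrying the searchable characteristics. The following is a minimal sketch in plain Python, not the Globus Catalog data model; all class, field, endpoint, and tag names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Member:
    """A file or directory that belongs to a dataset; location is just an attribute."""
    endpoint: str
    path: str


@dataclass
class Dataset:
    """A logical grouping of members, described by tags rather than by location."""
    name: str
    tags: Dict[str, object] = field(default_factory=dict)
    members: List[Member] = field(default_factory=list)


@dataclass
class Catalog:
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def find(self, **tags) -> List[Dataset]:
        """Return datasets whose tags match every keyword argument given."""
        return [d for d in self.datasets
                if all(d.tags.get(key) == value for key, value in tags.items())]


if __name__ == "__main__":
    catalog = Catalog("beamline-catalog")
    catalog.datasets.append(Dataset(
        name="run-42",
        tags={"technique": "scattering", "temperature_kelvin": 300.0},
        members=[Member("lab-server", "/data/run42/frame_000.tif")],
    ))
    catalog.datasets.append(Dataset(name="run-43", tags={"technique": "imaging"}))
    print([d.name for d in catalog.find(technique="scattering")])
```

Because a dataset holds references to members on any endpoint, moving a file changes one attribute of one member and leaves the dataset's tags and identity untouched.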
![Page 14: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/14.jpg)
Globus Catalog Web User Interface
![Page 15: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/15.jpg)
Near field-HEDM Workflow (Sharma, Almer)
** Supported by Data Engines for Big Data LDRD (Wilde, Wozniak, Sharma, Almer, Blaiszik)
• Detector: up to 1000 datasets/week
• Dataset: 360 files, 4 GB total
• 1: Median calc (MedianImage.c, uses Swift): 75s (90% I/O)
• 2: Peak Search (ImageProcessing.c, uses Swift): 15s per file
• Reduced Dataset: 360 files, 5 MB total
• 3: Generate Parameters (FOP.c): 50 tasks, 25s/task, ¼ CPU hours, concurrent
• 4: Analysis Pass (FitOrientation.c): 10^5 tasks, 20s/task, 555 CPU hours, then 1m/task, 1667 CPU hours, concurrent
• Up to 2.2 M CPU hours per week!
• Timing labels: real-time; overnight or real-time; feedback to experiment; real time: 4/4/2014; On Orthros
Experimenting “in the data dark”
• Feedback during each experiment was non-existent
• Required months to calculate relevant information for publication OR to find out experiment was corrupted
• Now, initial feedback over lunch using (Globus, SWIFT, and Catalog) to leverage HPC and track metadata
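The CPU-hour figures quoted for the analysis pass can be checked with simple arithmetic. The sketch below assumes the task count in step 4 reads as 10^5 (the exponent appears to have been flattened in extraction), that both the 20 s and the 1 min passes run for every dataset, and that the detector produces the full 1000 datasets per week; the function and variable names are illustrative only.

```python
def cpu_hours(tasks: int, seconds_per_task: float) -> float:
    """Total CPU hours for `tasks` independent tasks of equal duration."""
    return tasks * seconds_per_task / 3600.0


# Step 4, first pass: 10^5 tasks at 20 s each.
first_pass = cpu_hours(10**5, 20)
# Step 4, second pass: 10^5 tasks at 1 min each.
second_pass = cpu_hours(10**5, 60)

# Assumption: each dataset goes through both passes.
per_dataset = first_pass + second_pass
# Assumption: the detector delivers its stated maximum of 1000 datasets per week.
per_week = per_dataset * 1000

if __name__ == "__main__":
    print(f"first pass:  {first_pass:.1f} CPU hours")
    print(f"second pass: {second_pass:.1f} CPU hours")
    print(f"per week:    {per_week / 1e6:.2f} M CPU hours")
```

Under these assumptions the two passes come to roughly 555 and 1667 CPU hours, and the weekly total to about 2.2 million, which agrees with the figures on the slide.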
![Page 16: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/16.jpg)
Globus Data Publication
• Operated as a hosted service
• Designed for Big Data
• Bring your own (per collection) storage
• Extensible metadata schemas and input forms
• Customizable publication and curation workflows
• Associate unique and persistent digital identifiers with datasets
• Rich discovery model (in dev)
[Diagram: researcher, curator, published datastore]
• Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
• Curator reviews and approves; data set published on campus or other
• Peers and public search and discover data sets; access and transfer using
![Page 17: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/17.jpg)
Data Publication Dashboard
17
![Page 18: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/18.jpg)
Start a New Submission
18
Policies at the Collection Level
• Required metadata, schemas
• Data storage location
• Metadata curation policies
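One way to picture collection-level policies is as a small record that every submission is checked against. The sketch below is illustrative Python only, not the Globus publication API; `CollectionPolicy`, `missing_fields`, the `dc.`-prefixed field names, the `beamline` field, and the storage endpoint name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class CollectionPolicy:
    """Policies that apply to every submission in one collection."""
    required_fields: FrozenSet[str]  # required metadata
    storage_endpoint: str            # data storage location
    requires_curation: bool          # metadata curation policy


def missing_fields(policy: CollectionPolicy, metadata: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or empty, in sorted order."""
    return sorted(name for name in policy.required_fields if not metadata.get(name))


# Hypothetical policy: three Dublin Core elements plus one domain-specific field.
EXAMPLE_POLICY = CollectionPolicy(
    required_fields=frozenset({"dc.title", "dc.creator", "dc.date", "beamline"}),
    storage_endpoint="example-collection-store",
    requires_curation=True,
)

if __name__ == "__main__":
    draft = {"dc.title": "Near-field HEDM scan", "dc.creator": "A. Researcher"}
    print(missing_fields(EXAMPLE_POLICY, draft))
```

A submission form could call `missing_fields` before allowing the researcher to move on to assembling the dataset.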
![Page 19: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/19.jpg)
Describe Submission: Dublin Core
19
• Scientist or representative describes the data they are submitting
• For this collection, Dublin Core and a collection metadata template are required
![Page 20: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/20.jpg)
Describe submission: 2) Scientific metadata
20
• Scientist or representative describes the data they are submitting
• For this collection, Dublin Core and a collection metadata template are required
![Page 21: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/21.jpg)
Assemble the dataset
21
![Page 22: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/22.jpg)
Transfer Files to Submission Endpoint
22
• Scientist transfers dataset files to a unique publish endpoint
• Endpoint is created on collection-specified data store
• Dataset may be assembled over any period of time
• When submission is finished, dataset will be rendered immutable via checksum
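The last bullet ties immutability to checksums. A common way to do this is to record a checksum manifest when the submission is finalized and compare against it later. The sketch below shows that idea with the Python standard library. It detects changes rather than preventing them, it is not how Globus necessarily implements the feature, and `build_manifest`, `is_unchanged`, and the use of SHA-256 are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def is_unchanged(root: Path, manifest: Dict[str, str]) -> bool:
    """True if no file was added, removed, or modified since the manifest was built."""
    return build_manifest(root) == manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp)
        (dataset / "frame_000.dat").write_bytes(b"original contents")
        manifest = build_manifest(dataset)  # taken when the submission is finished
        print("unchanged:", is_unchanged(dataset, manifest))
        (dataset / "frame_000.dat").write_bytes(b"edited contents")
        print("unchanged after edit:", is_unchanged(dataset, manifest))
```

This sketch reads each file fully into memory, which is fine for an illustration but would need chunked hashing for large datasets.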
![Page 23: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/23.jpg)
Check Dataset Assembly
23
• Verify size, file names, etc.
• System attempts to determine file types
• Scientist can choose to edit, remove, or add more files
• Scientist then accepts the collection-specified license and completes the submission (not pictured)
![Page 24: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/24.jpg)
DOI Assignment
![Page 25: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/25.jpg)
Submission Curation
25
If configured, a curator can approve the submission, reject, or edit metadata
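The optional curation step can be read as a small state machine: a finished submission either waits for a curator or, when curation is not configured, is published directly. The sketch below is one hypothetical rendering in Python; the state names, action names, and the `curated` flag are assumptions rather than part of the Globus service.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "in progress"
    AWAITING_CURATION = "awaiting curation"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Allowed (state, action) pairs when a curator is configured for the collection.
TRANSITIONS = {
    (State.IN_PROGRESS, "submit"): State.AWAITING_CURATION,
    (State.AWAITING_CURATION, "approve"): State.PUBLISHED,
    (State.AWAITING_CURATION, "reject"): State.REJECTED,
    # Editing metadata keeps the submission in the curator's queue.
    (State.AWAITING_CURATION, "edit_metadata"): State.AWAITING_CURATION,
}


def advance(state: State, action: str, curated: bool = True) -> State:
    """Return the next state, or raise ValueError for a disallowed action."""
    if state is State.IN_PROGRESS and action == "submit" and not curated:
        return State.PUBLISHED  # no curator configured: publish directly
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state.value}") from None


if __name__ == "__main__":
    state = advance(State.IN_PROGRESS, "submit")
    state = advance(state, "edit_metadata")
    state = advance(state, "approve")
    print(state.value)
```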
![Page 26: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/26.jpg)
Discover a Published Dataset
26
• Search on ranged metadata
• Link back to the published dataset
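A ranged metadata search amounts to filtering records on a numeric field that falls between two bounds. The following minimal sketch shows the idea over an in-memory list; it is not the Globus discovery interface, and `search_range`, the record layout, and the identifiers are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def search_range(records: Iterable[Dict[str, Any]], field: str,
                 low: float, high: float) -> List[Dict[str, Any]]:
    """Return records whose numeric `field` lies in the closed range [low, high]."""
    return [record for record in records
            if isinstance(record.get(field), (int, float))
            and low <= record[field] <= high]


if __name__ == "__main__":
    # Hypothetical published records; "identifier" stands in for the link back
    # to the published dataset.
    published = [
        {"identifier": "dataset-001", "temperature_kelvin": 77.0},
        {"identifier": "dataset-002", "temperature_kelvin": 293.15},
        {"identifier": "dataset-003"},  # no temperature recorded
    ]
    hits = search_range(published, "temperature_kelvin", 250, 350)
    print([record["identifier"] for record in hits])
```

Records without the field are skipped rather than treated as errors, since published datasets in one index may follow different metadata templates.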
![Page 27: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/27.jpg)
View Downloaded Dataset
27
Use Globus Connect Personal to pull the files locally for analysis
![Page 28: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/28.jpg)
Summary
• Transfer
• User Authentication
• Groups
• Sharing
• Data Publication
• Data Cataloging
• Automation and Workflows
...all of this via SaaS and with your own (institutional or personal) resources or cloud resources
![Page 29: 20150126-Globus-USUK-Data-Workshop-Blaiszik-final](https://reader030.fdocuments.in/reader030/viewer/2022032513/55d16b0bbb61ebde118b46b6/html5/thumbnails/29.jpg)
Thank you to our sponsors!
U.S. DEPARTMENT OF ENERGY
Data Engines for Big Data LDRD